MIRAGE: An Iterative MapReduce based FrequentSubgraph Mining Algorithm

نویسندگان

  • Mansurul Bhuiyan
  • Mohammad Al Hasan
چکیده

Frequent subgraph mining (FSM) is an important task for exploratory data analysis on graph data. Over the years, many algorithms have been proposed to solve this task. These algorithms assume that the data structure of the mining task is small enough to fit in the main memory of a computer. However, as the real-world graph data grows, both in size and quantity, such an assumption does not hold any longer. To overcome this, some graph database-centric methods have been proposed in recent years for solving FSM; however, a distributed solution using MapReduce paradigm has not been explored extensively. Since, MapReduce is becoming the defacto paradigm for computation on massive data, an efficient FSM algorithm on this paradigm is of huge demand. In this work, we propose a frequent subgraph mining algorithm called MIRAGE which uses an iterative MapReduce based framework. MIRAGE is complete as it returns all the frequent subgraphs for a given user-defined support, and it is efficient as it applies all the optimizations that the latest FSM algorithms adopt. Our experiments with real life and large synthetic datasets validate the effectiveness of MIRAGE for mining frequent subgraphs from large graph datasets. The source code of MIRAGE is available from www.cs.iupui.edu/∼alhasan/software/

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Uncertain Sequential Patterns in Iterative MapReduce

This paper proposes a sequential pattern mining (SPM) algorithm in large scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and scalability. In this paper, we develop an efficient approach to manage data uncertainty in SP...

متن کامل

Research of Decision Tree on YARN Using MapReduce and Spark

Decision tree is one of the most widely used classification methods. For massive data processing, MapReduce is a good choice. Whereas, MapReduce is not suitable for iterative algorithms. The programming model of Spark is proposed as a memory-based framework that is fit for iterative algorithms and interactive data mining. In this paper, C4.5 is implemented on both MapReduce and Spark. The resul...

متن کامل

iMapReduce: Incremental MapReduce for Mining Evolving Big Data

As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most wi...

متن کامل

Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework

While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration we introduce a distributed approach for performing formal concept mining. Our method has its novelty in that we use a light-weight MapReduce runtime called Twister which is better suited to iterative algorithms...

متن کامل

An Efficient Implementation of Chronic Inflation based Power Iterative Clustering Algorithm

---------------------------------------------------------------------***--------------------------------------------------------------------Abstract -In recent days the software plays a major role in application writing areas such that big data, it is used to store the information which is efficiently processed. Managing large datasets is one of the challenging task and also time consumption is...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1307.5894  شماره 

صفحات  -

تاریخ انتشار 2013